OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

DiCoMo: An Algorithm Based Method to Estimate Digitization Costs in Digital Libraries

Internal identifier: 001316 (Main/Exploration); previous: 001315; next: 001317

Authors: Alejandro Bia [Spain]; Jaime Gómez [Spain]

RBID : ISTEX:903552E0AC429A3EDE9A54A2786AD96887283EBC

Abstract

Estimating web-content production costs is a very difficult task. It is difficult to make exact predictions because of the large number of unknown factors. However, digitization projects need a precise idea of the economic costs and times involved in developing their contents. As with software development projects, incorrect estimates lead to delays and cost overruns. Based on methods used in Software Engineering for software development cost prediction, like COCOMO [1] and Function Points [2], and using historical data gathered during five years of work at the Miguel de Cervantes Digital Library, where more than 12,000 books were digitized, we have refined an equation for digitization cost estimates named DiCoMo (Digitization Cost Model). This method can be adapted to different production processes, like the production of digital XML or HTML texts using scanning plus OCR and human proofreading, or the production of digital facsimiles (scanning images without OCR). The a priori estimates are improved as the project evolves by means of adjustments based on real data obtained from previous stages of the production process. Each estimate is a refinement obtained as a result of the work done so far.
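The abstract does not reproduce the DiCoMo equation itself, so the following is only a minimal sketch of the general idea it describes: a COCOMO-style power law (effort = a · size^b) whose scale coefficient is refitted as real production data come in. The function names, the choice of pages as the size unit, and all numeric values are assumptions made for illustration and are not taken from the paper.

```python
import math


def estimate_effort(size_pages, a, b, adjustment=1.0):
    """COCOMO-style power law: effort = a * size**b * adjustment.

    `a`, `b`, and `adjustment` are hypothetical model parameters,
    not the published DiCoMo coefficients.
    """
    if size_pages <= 0:
        raise ValueError("size_pages must be positive")
    return a * size_pages ** b * adjustment


def recalibrate_a(completed, b):
    """Refit the scale coefficient `a` from finished work, holding `b` fixed.

    `completed` is a list of (size_pages, actual_effort) pairs. With `b`
    fixed, the least-squares fit in log space is the mean of
    log(effort) - b * log(size), exponentiated.
    """
    if not completed:
        raise ValueError("need at least one completed item")
    logs = [math.log(effort) - b * math.log(size) for size, effort in completed]
    return math.exp(sum(logs) / len(logs))


# A priori estimate with made-up coefficients (effort in hours, say).
a0, b = 0.05, 1.05
initial = estimate_effort(300, a0, b)

# After two books are finished, refit `a` from the observed effort
# and re-estimate the remaining 300-page book.
observed = [(200, 14.0), (350, 26.0)]
a1 = recalibrate_a(observed, b)
refined = estimate_effort(300, a1, b)
print(f"initial: {initial:.1f}, refined: {refined:.1f}")
```

Refitting only `a` keeps the sketch short; a fuller model could also refit the exponent or per-process adjustment factors once enough completed items are available.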

DOI: 10.1007/11551362_62



The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">DiCoMo: An Algorithm Based Method to Estimate Digitization Costs in Digital Libraries</title>
<author>
<name sortKey="Bia, Alejandro" sort="Bia, Alejandro" uniqKey="Bia A" first="Alejandro" last="Bia">Alejandro Bia</name>
</author>
<author>
<name sortKey="G Mez, Jaime" sort="G Mez, Jaime" uniqKey="G Mez J" first="Jaime" last="G Mez">Jaime G Mez</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:903552E0AC429A3EDE9A54A2786AD96887283EBC</idno>
<date when="2005" year="2005">2005</date>
<idno type="doi">10.1007/11551362_62</idno>
<idno type="url">https://api.istex.fr/document/903552E0AC429A3EDE9A54A2786AD96887283EBC/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001313</idno>
<idno type="wicri:Area/Istex/Curation">001235</idno>
<idno type="wicri:Area/Istex/Checkpoint">000C25</idno>
<idno type="wicri:doubleKey">0302-9743:2005:Bia A:dicomo:an:algorithm</idno>
<idno type="wicri:Area/Main/Merge">001352</idno>
<idno type="wicri:source">INIST</idno>
<idno type="RBID">Pascal:05-0445900</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000435</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000352</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000416</idno>
<idno type="wicri:doubleKey">0302-9743:2005:Bia A:dicomo:an:algorithm</idno>
<idno type="wicri:Area/Main/Merge">001446</idno>
<idno type="wicri:Area/Main/Curation">001316</idno>
<idno type="wicri:Area/Main/Exploration">001316</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">DiCoMo: An Algorithm Based Method to Estimate Digitization Costs in Digital Libraries</title>
<author>
<name sortKey="Bia, Alejandro" sort="Bia, Alejandro" uniqKey="Bia A" first="Alejandro" last="Bia">Alejandro Bia</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Espagne</country>
<wicri:regionArea>Miguel Hernández University</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Espagne</country>
</affiliation>
</author>
<author>
<name sortKey="G Mez, Jaime" sort="G Mez, Jaime" uniqKey="G Mez J" first="Jaime" last="G Mez">Jaime G Mez</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Espagne</country>
<wicri:regionArea>University of Alicante</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Espagne</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2005</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">903552E0AC429A3EDE9A54A2786AD96887283EBC</idno>
<idno type="DOI">10.1007/11551362_62</idno>
<idno type="ChapterID">62</idno>
<idno type="ChapterID">Chap62</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>A priori estimation</term>
<term>Character recognition</term>
<term>Cost analysis</term>
<term>Delay</term>
<term>Development cost</term>
<term>Digital image</term>
<term>Digitizing</term>
<term>Economics</term>
<term>Economy</term>
<term>Electronic library</term>
<term>Facsimile</term>
<term>HTML language</term>
<term>Internet</term>
<term>Modeling</term>
<term>Optical character recognition</term>
<term>Production cost</term>
<term>Production process</term>
<term>Refinement method</term>
<term>Software development</term>
<term>Software engineering</term>
<term>Text</term>
<term>World wide web</term>
<term>XML language</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Analyse coût</term>
<term>Bibliothèque électronique</term>
<term>Coût développement</term>
<term>Coût production</term>
<term>Développement logiciel</term>
<term>Economie</term>
<term>Estimation a priori</term>
<term>Génie logiciel</term>
<term>Image numérique</term>
<term>Internet</term>
<term>Langage HTML</term>
<term>Langage XML</term>
<term>Modélisation</term>
<term>Méthode raffinement</term>
<term>Numérisation</term>
<term>Processus fabrication</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Retard</term>
<term>Réseau web</term>
<term>Sciences économiques</term>
<term>Texte</term>
<term>Télécopie</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Génie logiciel</term>
<term>Numérisation</term>
<term>Télécopie</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: Estimating web-content production costs is a very difficult task. It is difficult to make exact predictions because of the large number of unknown factors. However, digitization projects need a precise idea of the economic costs and times involved in developing their contents. As with software development projects, incorrect estimates lead to delays and cost overruns. Based on methods used in Software Engineering for software development cost prediction, like COCOMO [1] and Function Points [2], and using historical data gathered during five years of work at the Miguel de Cervantes Digital Library, where more than 12,000 books were digitized, we have refined an equation for digitization cost estimates named DiCoMo (Digitization Cost Model). This method can be adapted to different production processes, like the production of digital XML or HTML texts using scanning plus OCR and human proofreading, or the production of digital facsimiles (scanning images without OCR). The a priori estimates are improved as the project evolves by means of adjustments based on real data obtained from previous stages of the production process. Each estimate is a refinement obtained as a result of the work done so far.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Espagne</li>
</country>
</list>
<tree>
<country name="Espagne">
<noRegion>
<name sortKey="Bia, Alejandro" sort="Bia, Alejandro" uniqKey="Bia A" first="Alejandro" last="Bia">Alejandro Bia</name>
</noRegion>
<name sortKey="Bia, Alejandro" sort="Bia, Alejandro" uniqKey="Bia A" first="Alejandro" last="Bia">Alejandro Bia</name>
<name sortKey="G Mez, Jaime" sort="G Mez, Jaime" uniqKey="G Mez J" first="Jaime" last="G Mez">Jaime G Mez</name>
<name sortKey="G Mez, Jaime" sort="G Mez, Jaime" uniqKey="G Mez J" first="Jaime" last="G Mez">Jaime G Mez</name>
</country>
</tree>
</affiliations>
</record>
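The record above uses the `wicri:` prefix without declaring a namespace for it, so a namespace-aware XML parser will typically reject it as written. The sketch below shows one possible workaround with Python's standard `xml.etree.ElementTree`: declare a placeholder namespace on the root element before parsing, then read a few fields. The namespace URI, the helper names, and the trimmed sample are assumptions for illustration; they are not part of Dilib or Wicri.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the record shown above (same element names).
SAMPLE = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader><fileDesc>
<titleStmt>
<title xml:lang="en">DiCoMo: An Algorithm Based Method to Estimate Digitization Costs in Digital Libraries</title>
</titleStmt>
<publicationStmt>
<date when="2005" year="2005">2005</date>
<idno type="doi">10.1007/11551362_62</idno>
</publicationStmt>
</fileDesc></teiHeader>
</TEI>
</record>"""

# Placeholder URI: the real Wicri namespace, if any, is not given in the page.
WICRI_NS = "urn:example:wicri"


def parse_record(text):
    """Bind the undeclared `wicri:` prefix on the root, then parse."""
    patched = text.replace("<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1)
    return ET.fromstring(patched)


def summarize(root):
    """Pull a few header fields out of the parsed record."""
    return {
        "title": root.findtext(".//titleStmt/title"),
        "year": root.findtext(".//publicationStmt/date"),
        "doi": root.findtext(".//idno[@type='doi']"),
    }


info = summarize(parse_record(SAMPLE))
for field, value in info.items():
    print(f"{field}: {value}")
```

The `xml:` prefix needs no such treatment because it is predefined by the XML namespaces specification.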

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001316 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001316 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:903552E0AC429A3EDE9A54A2786AD96887283EBC
   |texte=   DiCoMo: An Algorithm Based Method to Estimate Digitization Costs in Digital Libraries
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024